Arabic Entity Graph Extraction Using Morphology, Finite State Machines, and Graph Transformations

Authors

  • Jad Makhlouta
  • Fadi A. Zaraket
  • Hamza Harkous
Abstract

Research on automatic recognition of named entities in Arabic text uses techniques that work well for Latin-based languages, such as local grammars, statistical learning models, pattern matching, and rule-based techniques. These techniques boost their results by using application-specific corpora, parallel language corpora, and morphological stemming analysis. We propose a method for extracting entities, events, and the relations among them from Arabic text using graph transformation algorithms and a hierarchy of finite state machines driven by morphological features such as part-of-speech and gloss tags. We evaluated our method on two natural language processing applications: extracting narrators and narrator relations from several corpora of Islamic narration books, and extracting genealogical family trees from Biblical texts. In both applications, our method reports high precision and recall and learns lemmas about phrases that improve results.
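As a rough illustration of the idea in the abstract, the sketch below runs one small state machine over tokens that already carry coarse tags and emits entities plus relation edges. It is not the authors' implementation: the tag names (`NAME`, `CUE`, `OTHER`), the function `extract_chain`, and the transliterated sample tokens are all assumptions made for this example, and a real system would derive the tags from a morphological analyzer and compose several such machines hierarchically.

```python
from typing import List, Optional, Tuple

# Hypothetical coarse tags. In the described method these would be derived
# from morphological features (part of speech, gloss); here they are
# assigned by hand for illustration only.
NAME, CUE, OTHER = "NAME", "CUE", "OTHER"

Token = Tuple[str, str]          # (word, tag)
Edge = Tuple[str, str]           # (earlier name, later name)


def extract_chain(tokens: List[Token]) -> Tuple[List[str], List[Edge]]:
    """Group consecutive NAME tokens into entities and link two entities
    with an edge when a CUE token appears between them."""
    entities: List[str] = []
    relations: List[Edge] = []
    current: List[str] = []      # words of the name currently being read
    prev: Optional[str] = None   # last completed entity in this chain
    cue_seen = False             # a cue occurred since `prev` was completed

    def flush() -> None:
        nonlocal current, prev, cue_seen
        if not current:
            return
        name = " ".join(current)
        entities.append(name)
        if prev is not None and cue_seen:
            relations.append((prev, name))
        current, prev, cue_seen = [], name, False

    for word, tag in tokens:
        if tag == NAME:
            current.append(word)         # stay in / enter the name state
        elif tag == CUE:
            flush()                      # a cue ends any name in progress
            cue_seen = True
        else:
            flush()                      # any other token breaks the chain
            prev, cue_seen = None, False
    flush()
    return entities, relations


# Illustrative, hand-tagged input (transliterated, not from the paper).
sample = [
    ("haddathana", CUE), ("Malik", NAME), ("ibn", NAME), ("Anas", NAME),
    ("an", CUE), ("Nafi", NAME), ("qala", OTHER),
]
entities, relations = extract_chain(sample)
print(entities)
print(relations)
```

The returned pair can be read as the nodes and edges of a small graph, which is the form on which graph transformations (for example, merging nodes that refer to the same person) would then operate.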

Similar resources

Arabic Cross-Document NLP for the Hadith and Biography Literature

Recently, cross-document integration and reconciliation of extracted information has become of interest to researchers in Arabic natural language processing. Given a set of documents A, we use Arabic morphological analysis, finite state machines, and graph transformations to extract named entities Na and relations Ra expressed as edges in a graph G = 〈Na, Ra〉. We use the same techniques to extract e...

MERF: Morphology-based Entity and Relational Entity Extraction Framework for Arabic

Rule-based techniques and tools to extract entities and relational entities from documents allow users to specify desired entities using natural language questions, finite state automata, regular expressions, structured query language statements, or proprietary scripts. These techniques and tools require expertise in linguistics and programming and lack support of Arabic morphological analysis ...

Person Name Recognition Using Injection of Candidate Name Words into Conditional Random Fields for Arabic

Named entity recognition and extraction are important tasks for discovering proper names, including persons, locations, dates, and times, in electronic textual resources. An accurate named entity recognition system is an essential utility for resolving fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation has become a vital task in Natural Language Processing subfields such as text summarization, question answering, text generation, and machine translation. Existing methods, such as entity-based and graph-based models, deal with how nouns and noun phrases change roles across sequential sentences within a short part of a text. They even have limitations in global coheren...

From UML 2 Sequence Diagrams to State Machines by Graph Transformation

Algebraic graph transformation has been promoted by several authors as a means to specify model transformations. This paper explores how we can specify graph transformation-based rules for a classical problem of transforming from sequence diagrams to state machines. The specification of the transformation rules is based on the concrete syntax of sequence diagrams and state machines. We introduc...

Publication date: 2012